UCBN: A new audio-visual broadcast news corpus for multimodal speaker verification studies

Authors

  • Girija Chetty
  • Michael Wagner
Abstract

The performance of face, voice, and multimodal speaker verification systems in complex, uncontrolled scenarios is typically lower than that of systems developed in highly controlled environments. To facilitate the development of robust multimodal speaker recognition systems, a new multimodal (audio-visual) Australian broadcast news corpus, UCBN (University of Canberra Broadcast News), was developed by capturing about 30 hours of daily television news programs from several free-to-air Australian TV channels over a period of two years. In this paper we describe the acquisition of UCBN and a new video preprocessing technique used to detect newscaster and anchorperson shots in news sequences. We also report speaker verification experiments using feature fusion of acoustic and visual speech features extracted from the mouth region. Performance on the complex UCBN database is compared with that on the controlled VidTIMIT database.
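
The abstract does not say how the acoustic and visual features are fused. As an illustration only, feature-level fusion is commonly done by aligning the slower visual stream to the audio frame rate and concatenating the two vectors frame by frame. The sketch below assumes that approach; the function names, the 4:1 frame-rate ratio, and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def align_visual_to_audio(visual: Sequence[Vector], n_audio_frames: int) -> List[Vector]:
    """Repeat visual frames so there is one visual vector per audio frame."""
    if not visual:
        raise ValueError("need at least one visual frame")
    n_visual = len(visual)
    aligned = []
    for i in range(n_audio_frames):
        # Map audio frame i to the visual frame covering the same time span.
        j = min(i * n_visual // n_audio_frames, n_visual - 1)
        aligned.append(list(visual[j]))
    return aligned


def fuse_features(audio: Sequence[Vector], visual: Sequence[Vector]) -> List[Vector]:
    """Feature-level fusion (assumed form): concatenate acoustic and visual vectors."""
    aligned = align_visual_to_audio(visual, len(audio))
    return [list(a) + v for a, v in zip(audio, aligned)]


# Toy data (hypothetical): 8 audio frames of dimension 2, 2 video frames of dimension 1.
audio_frames = [[float(i), float(i) + 0.5] for i in range(8)]
visual_frames = [[10.0], [20.0]]
fused = fuse_features(audio_frames, visual_frames)
```

Each fused vector here has three components (two acoustic, one visual). In a real system the fused vectors would then feed a speaker model, and the alignment step might use interpolation rather than simple repetition.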

Similar resources

Speaker tracking in a broadcast news corpus

Speaker tracking is the process of following who is speaking in an audio stream. When the audio stream is a recording of broadcast news, speaker identity can be important metadata for building digital libraries. Moreover, segmenting and classifying the audio stream in terms of acoustic content, bandwidth and speaker gender makes it possible to filter out portions of the signal wh...

Multifactor Fusion for Audio-Visual Speaker Recognition

In this paper we propose a multifactor hybrid fusion approach for enhancing security in audio-visual speaker verification. Speaker verification experiments conducted on two audio-visual databases, VidTIMIT and UCBN, show that multifactor hybrid fusion, involving a combination of feature-level fusion of lip-voice features and face-lip-voice features at the score level, is indeed a powerful technique for spe...

Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features

In this paper, we propose hybrid fusion of audio and explicit correlation features for speaker identity verification applications. Experiments were performed with GMM-based speaker models and a hybrid fusion technique involving late fusion of explicit cross-modal fusion features with implicit eigenlip and audio MFCC features. An evaluation of the system performance with different gender ...

Spectral cross-correlation features for audio indexing of broadcast news and meetings

This paper describes the effect of three new acoustic feature parameters, all based on spectral cross-correlation, for detecting audio source segments: spectral stability, white noise similarity, and sound spectral shape. These parameters are devised for accurate audio source detection and are used in a pre-processing module for automatic indexing of broadcast news and meetings. We condu...

Audio-Visual Speaker Recognition for Video Broadcast News

Significant progress has been made in the transcription of the audio stream in the broadcast news domain for both radio news and TV news (HUB4 task). Such transcripts provide an excellent means of indexing video content for search and retrieval. Speaker identification is an important technology in this domain both for selecting high-accuracy speaker-dependent models for transcription and as an in...

Publication date: 2006